First, let’s examine the cars dataset by displaying its first 6 rows.
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
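The listing above matches what `head()` returns with its default of 6 rows. A minimal sketch, assuming the built-in `cars` data frame from R's datasets package (attached by default); the name `first_rows` is only illustrative:

```r
# cars ships with base R in the datasets package, so no import is needed.
first_rows <- head(cars)  # head() returns the first 6 rows by default
print(first_rows)
```

Printing `cars` on its own, without `head()`, displays every row.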
This is the entire dataset.
##    speed dist
## 1      4    2
## 2      4   10
## 3      7    4
## 4      7   22
## 5      8   16
## 6      9   10
## 7     10   18
## 8     10   26
## 9     10   34
## 10    11   17
## 11    11   28
## 12    12   14
## 13    12   20
## 14    12   24
## 15    12   28
## 16    13   26
## 17    13   34
## 18    13   34
## 19    13   46
## 20    14   26
## 21    14   36
## 22    14   60
## 23    14   80
## 24    15   20
## 25    15   26
## 26    15   54
## 27    16   32
## 28    16   40
## 29    17   32
## 30    17   40
## 31    17   50
## 32    18   42
## 33    18   56
## 34    18   76
## 35    18   84
## 36    19   36
## 37    19   46
## 38    19   68
## 39    20   32
## 40    20   48
## 41    20   52
## 42    20   56
## 43    20   64
## 44    22   66
## 45    23   54
## 46    24   70
## 47    24   92
## 48    24   93
## 49    24  120
## 50    25   85
\(\frac{a+b}{c+d}\)
\[\lim\limits_{x \to \infty} \exp(-x) = 0\]
Let’s investigate the size and dimensionality of our dataset.
## [1] 50
## [1] 2
The dataset has 50 observations (rows) and 2 variables (columns).
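The two values above correspond to the row and column counts of the data frame. A short sketch of how they can be obtained, again assuming the built-in `cars` data frame; the variable names `n_obs` and `n_vars` are illustrative only:

```r
n_obs  <- nrow(cars)  # number of observations (rows)
n_vars <- ncol(cars)  # number of variables (columns)
print(n_obs)
print(n_vars)

# dim() returns both values at once as c(rows, columns)
print(dim(cars))
```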
Let’s investigate our dataset using a five-number summary (minimum, first quartile, median, third quartile, maximum). R’s summary also reports the mean.
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00  
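The table above has the shape of the output of `summary()` applied to a data frame. As a sketch under that assumption, the following reproduces it and also extracts the five numbers alone (without the mean) for one column using `quantile()`; `cars_summary` and `speed_five` are illustrative names:

```r
# summary() on a data frame reports min, quartiles, median, mean and max per column
cars_summary <- summary(cars)
print(cars_summary)

# The five numbers alone for the speed column, via quantile() with default settings
speed_five <- quantile(cars$speed, probs = c(0, 0.25, 0.5, 0.75, 1))
print(speed_five)
```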
Let’s visualise the distributions of both variables using boxplots, first with base R graphics.
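The base R code is not shown at this point, so the following is only a sketch of one way to draw the two boxplots, not necessarily the call originally used. The side-by-side layout, titles and axis labels are assumptions; the units (mph and ft) come from the documentation of the `cars` dataset.

```r
# Draw the two boxplots side by side with base R graphics
old_par <- par(mfrow = c(1, 2))  # 1 row, 2 panels; keep the old settings
speed_stats <- boxplot(cars$speed, main = "speed", ylab = "Speed (mph)")
dist_stats  <- boxplot(cars$dist,  main = "dist",  ylab = "Stopping distance (ft)")
par(old_par)                     # restore the previous layout
```

`boxplot()` invisibly returns the statistics it plots, which is why the results can be stored and inspected afterwards.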
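This is a ggplot2 alternative, converted to an interactive plot with plotly.
library(magrittr)  # provides the %>% pipe
library(ggplot2)   # ggplot(), aes(), geom_boxplot()
library(plotly)    # ggplotly()

# Boxplot of speed; ggplotly() with no arguments converts the last ggplot
cars %>%
  ggplot(aes(y = speed)) +
  geom_boxplot()
  # geom_histogram(binwidth = 5)
ggplotly()

# Boxplot of stopping distance
cars %>%
  ggplot(aes(y = dist)) +
  geom_boxplot()
ggplotly()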
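This is the base R code for a scatterplot of stopping distance against speed.
plot(cars)
This is the ggplot2 equivalent, stored in a variable and passed to ggplotly() to make it interactive.
xy_plot <- cars %>%
  ggplot(aes(x = speed, y = dist)) +
  geom_point() +
  theme_bw()
ggplotly(xy_plot)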
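Finally, let’s compute the correlation between speed and stopping distance, rounded to 2 decimal places.
round(cor(cars$speed, cars$dist), 2)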
## [1] 0.81
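By default `cor()` computes Pearson's correlation coefficient, \(r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}\). As an illustrative check, the sketch below evaluates that formula by hand and compares it with the built-in function; the helper variable names are not from the original.

```r
x <- cars$speed
y <- cars$dist

# Deviations from the respective means
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r from its definition
r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(x, y)  # method = "pearson" is the default

print(round(r_manual, 2))
print(all.equal(r_manual, r_builtin))
```

A value this close to 1 indicates a strong positive linear association between speed and stopping distance.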